Beware of Machine Learning-Based Scoring Functions - On the Danger of Developing Black Boxes

Authors

  • Joffrey Gabel
  • Jérémy Desaphy
  • Didier Rognan
Abstract

Training machine learning algorithms on protein-ligand descriptors to predict binding constants from atomic coordinates has recently gained considerable attention. Starting from a series of recent reports stating the advantages of this approach over empirical scoring functions, we could indeed reproduce the claimed superiority of Random Forest and Support Vector Machine-based scoring functions in predicting experimental binding constants from protein-ligand X-ray structures of the PDBBind dataset. Strikingly, these scoring functions, trained on simple protein-ligand element-element distance counts, were almost unable to enrich virtual screening hit lists in true actives upon docking experiments on 10 reference DUD-E datasets; an a priori less accurate empirical scoring function (Surflex-Dock), by contrast, did achieve such enrichment. By systematically varying ligand poses from true X-ray coordinates, we show that the Surflex-Dock scoring function is, as expected, sensitive to the quality of docking poses. Conversely, our machine learning-based scoring functions are totally insensitive to docking poses (up to 10 Å root-mean-square deviation) and just describe atomic element counts. This report does not disqualify using machine learning algorithms to design scoring functions. Protein-ligand element-element distance counts should however be used with extreme caution and only applied in a meaningful way. To avoid developing novel but meaningless scoring functions, we propose that two additional benchmarking tests must be systematically done when developing novel scoring functions: (i) sensitivity to docking pose accuracy, and (ii) ability to enrich hit lists in true actives upon structure-based (docking, receptor-ligand pharmacophore) virtual screening of reference datasets.
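As a rough illustration of what an element-element distance count descriptor looks like, and of how such counts can stay unchanged when a ligand pose is displaced, here is a minimal toy sketch. It is not the authors' descriptor or benchmark: the function name `pair_counts`, the atom representation, the coordinates, and both cutoff values (12.0 Å and 2.5 Å) are assumptions chosen only for demonstration.

```python
from collections import Counter
from math import dist


def pair_counts(protein, ligand, cutoff):
    """Count protein-ligand atom pairs per element pair within `cutoff` (in Å).

    Each atom is a tuple (element, (x, y, z)).
    Keys of the result are (ligand_element, protein_element).
    """
    counts = Counter()
    for lig_el, lig_xyz in ligand:
        for prot_el, prot_xyz in protein:
            if dist(lig_xyz, prot_xyz) <= cutoff:
                counts[(lig_el, prot_el)] += 1
    return counts


def translate(atoms, shift):
    """Return a copy of `atoms` rigidly shifted by the vector `shift`."""
    return [
        (el, tuple(c + s for c, s in zip(xyz, shift)))
        for el, xyz in atoms
    ]


# Hypothetical toy coordinates (Å), not taken from any real structure.
protein = [("C", (0.0, 0.0, 0.0)), ("N", (3.0, 0.0, 0.0)), ("O", (0.0, 4.0, 0.0))]
ligand = [("C", (1.0, 0.0, 0.0)), ("O", (0.0, 1.0, 0.0))]

# Displace the ligand by 3 Å along z to mimic a perturbed pose.
moved = translate(ligand, (0.0, 0.0, 3.0))

# With a generous cutoff every pair is still counted, so the descriptor
# is identical for both poses and reduces to products of element counts.
same_wide = pair_counts(protein, ligand, 12.0) == pair_counts(protein, moved, 12.0)

# With a short cutoff the counts do depend on the pose.
same_narrow = pair_counts(protein, ligand, 2.5) == pair_counts(protein, moved, 2.5)

print(same_wide, same_narrow)
```

In this toy setting the wide-cutoff descriptor cannot distinguish the two poses, which is the kind of behaviour a pose-sensitivity benchmark, as proposed in the abstract, is meant to expose.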

Similar resources

Body Mass Index Classification based on Facial Features using Machine Learning Algorithms for utilizing in Telemedicine

Background and Objectives: Because controlling BMI has an impact on life, BMI classification based on facial features can be used to develop telemedicine systems and to overcome the limitations of measuring tools, especially for paralyzed people, so that physicians can help people online during the Covid-19 pandemic. Method: In this study, new features and some previous work features were e...

Transparent Machine Learning Algorithm Offers Useful Prediction Method for Natural Gas Density

Machine-learning algorithms aid predictions for complex systems with multiple influencing variables. However, many neural-network related algorithms behave as black boxes in terms of revealing how the prediction of each data record is performed. This drawback limits their ability to provide detailed insights concerning the workings of the underlying system, or to relate predictions to specific ...

Investigating the performance of machine learning-based methods in classroom reverberation time estimation using neural networks (Research Article)

Classrooms, as one of the most important educational environments, play a major role in the learning and academic progress of students. Reverberation time, as one of the most important acoustic parameters inside rooms, has a significant effect on sound quality. The inefficiency of classical formulas such as Sabine's led this article to examine the use of machine learning methods as an alternat...

Forecasting the Tehran Stock market by Machine Learning Methods using a New Loss Function

Stock market forecasting has attracted so many researchers and investors that many studies have been done in this field. These studies have led to the development of many predictive methods, the most widely used of which are machine learning-based methods. In machine learning-based methods, the loss function has a key role in determining the model weights. In this study a new loss function is ...

A New Fuzzy Stabilizer Based on Online Learning Algorithm for Damping of Low-Frequency Oscillations

A multi-objective Honey Bee Mating Optimization (HBMO) designed by online learning mechanism is proposed in this paper to optimize the double Fuzzy-Lead-Lag (FLL) stabilizer parameters in order to improve the damping of low-frequency oscillations in a multi-machine power system. The proposed double FLL stabilizer consists of a low pass filter and two fuzzy logic controllers whose parameters can be set by the ...

Journal:
  • Journal of chemical information and modeling

Volume 54, Issue 10

Pages: -

Publication date: 2014